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Abstract 

A fast new algorithm is proposed for numerical computation of (approximate) D-optimal 
designs. This cocktail algorithm extends the well-known vertex direction method (VDM; 
Fedorov 1972) and the multiplicative algorithm (Silvey, Titterington and Torsney, 1978), 
and shares their simplicity and monotonic convergence properties. Numerical examples show 
that the cocktail algorithm can lead to dramatically improved speed, sometimes by orders of 
magnitude, relative to either the multiplicative algorithm or the vertex exchange method (a 
variant of VDM). Key to the improved speed is a new nearest neighbor exchange strategy, 
which acts locally and complements the global effect of the multiplicative algorithm. Possible 
extensions to related problems such as nonparametric maximum likelihood estimation are 
mentioned. 

Keywords: D-optimality; experimental design; hybrid algorithm. 

1 Introduction 

This paper studies numerical methods for computing D-optimal designs (approximate theory; 
see Kiefer 1974, Pukelsheim 1993, and Atkinson, Donev and Tobias, 2007). Given a parametric 
model, the problem is to find an allocation of weights to the design points x±,. . . ,x n (which 
encode the explanatory variables at specific values) so that the determinant of the Fisher in- 
formation matrix of the parameter is maximized. We focus on the linear model and discuss 
possible extensions in Section 5. Two strategies for this classical problem are the vertex direc- 
tion method (VDM; see Fedorov 1972 and Wynn 1972) and the multiplicative algorithm (Silvey, 
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Titterington and Torsney, 1978). Both VDM and the multiplicative algorithm are simple it- 
erative strategies that converge monotonically, i.e., the determinant criterion never decreases 
along the iterations. Though easy to implement, VDM or the multiplicative algorithm can be 
slow, and various strategies have been devised to remedy this. In particular, Bohning (1986) 
proposes the vertex exchange method (VEM) as a more effective variant of VDM. Variants of 
the multiplicative algorithm are considered by, for example, Titterington (1978), Mandal and 
Torsney (2006), and Dette, Pepelyshev and Zhigljavsky (2008). 

In this work we propose a cocktail algorithm for efficient computation of D-optimal designs. 
As the name suggests, this is based on a combination of several strategies, including VDM and 
the multiplicative algorithm. A new ingredient that contributes significantly to its effectiveness, 
however, is a nearest neighbor exchange strategy, which is intended to complement the multi- 
plicative algorithm. Two desirable effects of nearest neighbor exchanges are i) elimination of 
multiple bad support points, and ii) quick apportionment between very similar support points. 
Both compensate for the potentially slow convergence rate of the multiplicative algorithm, while 
the former also reduces its computing time per iteration. Operationally, the nearest neighbor 
exchanges are as simple to implement as VDM, VEM, or the multiplicative algorithm. The 
speedup brought in by such a simple modification, however, can be dramatic. 

In Section 2, after a brief review of the D-optimal design problem on finite design spaces, we 
describe the multiplicative algorithm, VDM, and VEM. Then we introduce the nearest neighbor 
exchange strategy and formally define the cocktail algorithm. Section 3 establishes that the 
cocktail algorithm is monotonically convergent. Section 4 presents numerical illustrations with 
several regression models. The cocktail algorithm compares favorably with the multiplicative al- 
gorithm, VEM, and general optimization methods such as conjugate gradient and quasi-Newton. 
Section 5 concludes with a discussion on possible extensions. 

2 Algorithms for D-optimal designs 

We focus on the important case of a finite design space X = {xi, . . . ,x n } C R m , which may 
be the result of discretizing an underlying continuous space. An approximate design (Kiefer, 
1974) is any probability vector w = (w\, . . . , w n ) <G Cl, where & denotes the closure of Q = 
{w : Yli=i w i = 1) Wi > 0}. The value Wi represents the proportion of units an experimenter 
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assigns to Xj. Approximate designs allow Wi to be real numbers; some rounding is usually used 
to convert w to a design with a finite sample size. Suppose the response from a unit assigned 
to xi is modeled as 

y\(x i ,8)~N(xJd,a 2 ), 

where 8 (m x 1) is the parameter of interest, and suppose responses from different units are 
independent. Then the Fisher information matrix for 8 is proportional to 

n 

M(w) = WjXjxJ . 
i=l 

A design w* is D-optimal if it maximizes 

cj)(w) = logdetM(u;), w G £l. 

Equivalently, a D-optimal design minimizes the determinant of the variance matrix of the best 
linear unbiased estimator of 8. The D-criterion is among the most widely used optimal design 
criteria. 

We shall describe several iterative algorithms for finding D-optimal designs. These differ in 
the choice of the starting value and the updating rule tv® — > vj( t+1 \ The following common 
convergence criterion, however, will be used throughout. Define 

d(i,j,w) = xj M~ 1 (w)xj, d(i,w) = d(i,i,w). 

Note that d(i,w) = d(f)(w)/dwi. Alternatively, d(i,w) — m is a directional derivative d(f>({l — 
8)w + 5ei) / d5\$=o + where the probability vector puts all the mass on Xj. 
Convergence criterion: 

mT 1 max d (i,w^A < 1 + e, (1) 

l<i<n \ J 

where e is a small positive constant. 

This convergence criterion can be motivated from the general equivalence theorem (Kiefer 
and Wolfowitz, 1960), part of which states that w is D-optimal if and only if 

mT x max d(i,w) = 1. 

i<i<n y ' 

The general equivalence theorem thus allows us to check whether a given weight allocation 
w = (wi, . . . ,w n ) is D-optimal; it is crucial to both analytic and numerical approaches to the 
problem. 
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2.1 The multiplicative algorithm 

The multiplicative algorithm (MA) refers to a well-known proposal of Silvey et al. (1978). 
Algorithm I (the multiplicative algorithm) 

Starting value. Choose £ Q,, i.e., wf 1 > for all i. 
Updating rule. 

w f +1) =w? ) m- 1 d(i,w ( - t) ^ , i = l,...,n. (2) 
Let us denote the mapping ([2]) as u>(* +1 ) = MA(w^ t '). Equivalently, ([2]) can be written as 

w p»= w ? *ir W) , /8 ? — . w 

YTj=\ w ) (wW) /dwj 

which highlights X/i^i = ^> ^ na ^ w^ t+1 ^ is correctly normalized. Heuristically, ([3]) simply 
adjusts the weights w so that proportionally more weight is put on Xi if the gain in the objective 
function by a slight increase in W{ (i.e., d<j)(w) j dtVi) is larger. 

Algorithm I has generated considerable interest; see, for example, Titterington (1976, 1978), 
Silvey et al. (1978), Mandal and Torsney (2006), Harman and Pronzato (2007), and Dette et 
al. (2008). The latter three papers are concerned with improving the multiplicative algorithm 
based on principles different from the exchange strategies reported here. Mandal and Torsney 
(2006) consider applying a class of multiplicative algorithms to clusters of design points for better 
efficiency. Harman and Pronzato (2007) study methods to exclude nonoptimal design points so 
that the dimension of the problem is reduced. Dette et al. (2008) propose a modification of 
Algorithm I which takes larger steps at each iteration but still maintains monotonic convergence 
(see also Yu 2010b). Another relevant work is Yu (2010a), which formulates Algorithm I as an 
iterative conditional minimization procedure and is mainly concerned with theoretical properties. 
Algorithm I is an important ingredient in our proposed cocktail algorithm. 

2.2 VDM and VEM 

The vertex direction method (VDM) is defined by the following iteration w — > w new . 
VDM: Select 1 < i max < n such that 

d(i ma x,w) = max d(i,w), (4) 

Ki<n 
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and set w new = VDM(w) as 



new 



(1 - 5)wi, i / '•max > 

(l-5)Wi + 5, i = i m ax, 



w new 



where 5 S [0, 1] is such that det M{w new ) is maximized. The maximizing 5 is available in closed 
form: 

s _ d(i max ,w) /m-1 

d(imax,w) - 1 

See Fedorov (1972) for the underlying rationale. Plainly, we move w in the direction of a 
design point toward which the directional derivative of <j> is the greatest. VDM is a steepest 
ascent strategy in this sense. With a slight abuse of notation, we shall occasionally write 
w new _ Y£)M(i max ,w) instead of w new = VDM(w) to emphasize the index i m ax- 

Closely related to VDM is a general exchange step w — > w new for any two design points 
%j ; %k ? j t 1 k. 

VE(j, k): Set w new as 

Wi, i^{j,k}, 
Wi-6, i = j, 
Wi + 5, i = k, 

where 5 G [— Wk, Wj] is chosen such that det M(w new ) is maximized. Following Bohning (1986), 
it can be shown that the maximizing 5 is 

5 = mm{ujj, max{-wj,, 5*(j,k)}} , 

where 

k) = d(k,w) - d(j,w) 

U ' > 2(d(j,w)d(k,w)-d 2 (j,k,w)y 1 1 

Plainly, VE(j, k) performs an optimal exchange of mass between Xj and Xk- The exchange is 
optimal in the sense of maximal increase in the determinant of the information matrix. When 
5 = Wj, all the mass assigned to Xj (which has a smaller d(j,w), indicating that it should 
carry less weight) is transferred to x^', similarly when 5 = —Wk- We shall denote this mapping 
w -> w new by w new = VE(j, k, w). 

Remark. The denominator in ([5]) is nonnegative by Cauchy-Schwarz. It becomes zero 
only when one of constant multiple of the other, in which case we define 5*(j,k) as 



+oo or — oo according as d(k,w) > d(j,w) or d(k,w) < d(j,w). If Xj + xt = 0, then both 
the numerator and the denominator in ([5]) become zero, and det M (w new ) is constant as a 
function of 5 £ [—Wk,Wj]; we set S*(j,k) as an arbitrary constant (say zero) in this case. These 
contingencies rarely arise in practice. 

The vertex exchange method (VEM) of Bohning (1986) performs an optimal exchange be- 
tween two special design points. 

Algorithm II (the vertex exchange method) 

Starting value. Choose € Cl such that det M(w^) > 0. 

Updating rule. Select i m i n and i m ax such that 

d(w,™ (t) ) = mm {d(i,w^) : wf ] > o} , (6) 
d Umai,^*') = max d (i, w®) . (7) 

Set 

^ — VE yi max > i mini U)^ ^ 

That is, VEM finds 

imin (resp. imax) such that d(i,w) is minimized (resp. maximized), and 
then performs an optimal transfer of mass from Xi min to Xi max (it is also required that i m in have 
nonzero mass to supply to imax)- Hence we may view VEM as a steepest ascent strategy in its 
choice of the two indices i m in an d imax- We shall numerically compare VEM with our proposed 
algorithm in Section 4. 

2.3 Nearest neighbor exchanges 

A key ingredient in our proposed algorithm is a nearest neighbor exchange strategy, which can be 
motivated as follows. Intuitively, Algorithm I (the multiplicative algorithm) may have difficulty 
apportioning the mass between adjacent design points. Consider two design points Xi and Xj 
that are close together as measured by some distance metric in R m . Then d(i, w) ~ d(j, w), and 
according to (j2j), we have 



w\ 



_w? ) d(i,wV) 
wf +1) wfd(j,w(t)) w f 



That is, the relative proportions between xi and Xj barely change from iteration to iteration. 
Another way of putting it is that, if Xi is a support point of the optimal design, then it would 
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take many iterations before Algorithm I can significantly reduce the mass on those Xj which are 
adjacent to X{ but are not support points. 

A simple remedy is to add nearest neighbor exchanges (NNEs) to Algorithm I. NNEs are 
easy to define when there exists a natural ordering in the design space. An example is 

X = {xi = (l,f(i/n)) T , i = l,...,n], (8) 

where / is a continuous function on [0, 1] representing a single quantitative predictor. In such a 
case Xi and Xj are close whenever is small. Given the current iterate let i± < ■ ■ ■ < i p+ i 
denote the indices of the support points of w®. We may consider performing vertex exchanges 
between Xi j and Xi j+1 for j = 1, . . . ,p in turn, i.e., 

w (t+j/p) = y E W (t+U-1)/P)j t j = 1, . . . , J), 

where fractional superscripts denote intermediate output. We refer to the mapping — > 
w( t+1 \ which consists of p sub-steps, as the set of nearest neighbor exchanges. Note that non- 
support points of are excluded, i.e., Xi j+1 is a "nearest neighbor" of x^ in the support of 
only. 

This intuitively appealing prescription depends on a natural ordering of x\. Sometimes there 
is no single natural ordering, e.g., when the design space encodes two or more factors. Selecting 
an ordering that best captures the neighborhood structure is therefore an interesting problem. In 
our numerical examples (Section 4), we explore another approach, which dynamically determines 
the nearest neighbors at each iteration. Specifically, let \\xj — x^W denote the distance between 
design points Xj and Xk, as measured by the L\ norm. The choice of the metric does not make 
much difference in our experience. 

NNE: Let i\, . . . ,i p +i be the elements of {i : wf^ > 0} where p + 1 is the number of 
support points of ro^. For each ij, j = 1, . . . ,p, let i* be any index i G {ij+i, ■ ■ ■ ,"ip+i} such 
that \\x{ — Xij || is minimized. Perform vertex exchanges between x\. and x«* for j = 1, ... ,p in 
turn, i.e., 

w (t+j/ P ) = VE ^. ji * jU ,(*+0--i)/p)). 

Again, non-support points of are excluded. We shall denote the composite mapping 
„,(*) _> w (t+i) as w (t+i) = NNE(w^). 
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Remark. The index i* is defined as a minimizer of ||:cj — Xi j \\ over i 6 {ij+i, ■ ■ ■ , 2 p +i}, 
rather than over i G {ii,... , ij-i, ij+i, ■ ■ ■ , ip+i}, to avoid possible redundancies. If we adopt 
the latter definition, then for two points that are nearest neighbors of each other, we would have 
two exchange steps in one iteration between these same points. 

2.4 The cocktail algorithm 

NNE has a serious problem as a stand-alone algorithm. By definition, we have wf +1 ' 1 = once 
w!f = 0, i.e., the point Xi remains outside of the support set. Algorithm I, which suffers from 
the same problem, circumvents it by assigning positive initial mass to each design point. NNE 
may result in wf +l ^ = even if wf > 0. The problem persists when we combine NNE and 
Algorithm I. 

An easy solution is to add in the updating rule of VDM. By definition, a VDM step can 
put some mass on a design point that was assigned zero mass previously. We define the cocktail 
algorithm as a combination of VDM, NNE, and Algorithm I. 

Algorithm III (the cocktail algorithm) 

Starting value. Choose w^ ' £ Cl such that detM(u/°)) > 0. 

Updating rule. Perform an iteration of VDM, the nearest neighbor exchanges, and then an 
iteration of Algorithm I. That is, let 

w (t+i/3) = VDM ( W W), w^ t+2 ^ = NNE(w^ t+1 ^), = MA(w^ t+2 ^), (9) 

where again fractional superscripts indicate intermediate output. 

An added benefit of ([9]) is that NNE helps keep the number of support points of ?i/* +2 / 3 ) 
small, so that each iteration of t</ i+1 ) = MA(w( t+2 /'^) costs little time, as we need not update 
the coordinates of ?z/*+ 2 / 3 ) that are zero at the multiplicative step. 

We may consider using a VEM step instead of the VDM step above. The resulting algorithm 
is similarly effective (numerical comparison omitted) , although the convergence proof of Section 3 
does not seem to extend easily to this alternative cocktail algorithm. 

3 Monotonic convergence 

Several algorithms considered here have an appealing monotonic convergence property, i.e., as 
t t oo, det M(w^) increases to sup wen det M(w). Monotonic convergence of Algorithm I is 
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well-established (see Titterington, 1976; Pazman, 1986; Dette et al., 2008; Yu, 2010a). Bohning 
(1986) has given a proof of the monotonic convergence of Algorithm II, i.e., VEM. Theoretical 
results concerning algorithms related to VEM and VDM can be found in Atwood (1976) and 
Wu (1978), for example. 

The monotonicity of Algorithm III is immediate since VDM, the nearest neighbor exchanges, 
and Algorithm I are all monotonic. That Algorithm III converges is a consequence of this 
monotonicity and the global convergence nature of VDM. 

Theorem 1. Assume the nxm matrix X = (x\, . . . , x n ) T has full rank m. Then Algorithm III 
converges monotonically starting from any € f2+ where $7 + = {w € f2 : det M(w) > 0}. 

Proof. See the Appendix. □ 

4 Numerical examples 

We illustrate the effectiveness of the cocktail algorithm by comparing it with Algorithms I and 
II for a few regression models. VDM by itself is very slow and is excluded from the comparisons. 
All algorithms are implemented in R, and the source code is available upon request from the 
author. The main program contains fewer than 150 lines of code, showing that Algorithms I— III 
are indeed easy to implement. We also consider general-purpose algorithms such as Nelder- 
Mead, conjugate gradient (CG), and quasi-Newton (specifically, the Broyden-Fletcher-Goldfarb- 
Shanno, or BFGS method). These are known to be powerful for solving various high-dimensional 
optimization problems. However, they are not the most effective for the D-optimal design 
problem considered here. 

For Algorithms I— III, both the number of iterations and the computer time (as measured by 
the R function system.time()) are reported. An iteration of the cocktail algorithm is counted as 
one iteration each of VDM, NNE and MA. It may seem that the iteration count comparison would 
favor the cocktail algorithm unfairly. Careful inspection, however, shows that the computing 
time per iteration for the cocktail algorithm is spent mainly by the VDM step, because the NNE 
and MA steps only work with design points that receive positive mass in the current iteration, 
and this set of support points is typically much fewer than n. Consequently the computing costs 
per iteration are actually comparable for VEM (i.e., Algorithm II) and the cocktail algorithm. 
At any rate, the reader is reminded to focus on the computing time comparisons. 

9 



For VEM and the cocktail algorithm, the starting design is the uniform design over 
a set of approximately 2m randomly sampled support points. This is intended to ensure that 
det M(u/°)) > while keeping the number of support points small. VEM tends to take more 
iterations if the initial design has more support points, since it can remove at most one bad 
support point per iteration. It is observed that the cocktail algorithm is relatively insensitive 
to the initial number of support points. The multiplicative algorithm is always started at the 
uniform design over all n points, as it cannot afford to exclude any design point a priori (see, 
however, Harman and Pronzato 2007). 

We consider the design spaces 

Xi(n) = ja* = (e~ s \ Sie- s \e- 2si ,Sie- 2si ) T , 1 < i < n} , 
X 2 (n) = jxj = (1, Si, sf, sf, sf) T , 1 < % < nj , 

X 3 (n) = {xi = (e~ s \ s i e- s \e- 2s \s i e- 2s %e- 3s \Sie- 3s \e-^\s i e-' ls *) T , 1 < i < n\ , 

where = 3i/n, i = 1, . . . , n. The space X\(n) represents the linearization of a compartmental 
model (see, e.g., Atkinson et al. 1993, and Dette, Melas and Wong 2006) 

y\(s, 6) ~ d ie ~^ s + 6 3 e- e * s + N(0, a 2 ) 

at 02 = 1 and 64 = 2 (the underlying design variable is s G [0, 3] on a grid of n evenly spaced 
points). The space X%{n) is similar to X\{n) but has a parameter of higher dimension. We 
include polynomial regression as represented by A^(n), although analytic results are well known 
in this case (see, e.g., Pukelsheim, 1993). For Sj = i/k, r\ = 2ijk — 1, % = 1, . . . , k, we also 
consider 

X 4 (k 2 ) = |x (i _ 1)fc+i = (1, r h r 2 , sj, riSj) T , 1 < i, j < /c| . 

This last example represents a response surface with a nonlinear effect and an interaction. 

Each algorithm is stopped when either the convergence criterion ([T]) is met with e = 10~ 6 , 
or the number of iterations exceeds 10000. For large design spaces, some of the experiments are 
aborted because the algorithm under consideration takes too much time, especially compared 
with the cocktail algorithm. We have also considered less stringent convergence criteria such as 
e = 10~ 5 , and the results (omitted) are similar. 
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As is evident from Tables 1-4, the cocktail algorithm is a substantial improvement over 
both the multiplicative algorithm and VEM. Because of the random starting values, results for 
VEM and the cocktail algorithm vary from replication to replication; the qualitative comparison, 
however, remains the same. The tables report the median computing time (iteration count) over 
three replications for VEM and the cocktail algorithm. For X\{n) and X^n), VEM is much faster 
than the multiplicative algorithm; the situation is less clear for ^(n). The cocktail algorithm 
improves upon the better of the two, often by large factors. MA and VEM tend to take more 
iterations for larger n, i.e., when the design space becomes finer, although peculiar exceptions 
do exist (e.g., VEM in Table 2). The cocktail algorithm seems insensitive to n concerning the 
number of iterations, at least for the design spaces considered. 

We also consider Nelder-Mead, conjugate gradient (CG), and quasi-Newton algorithms, 
which are readily available via the R function optim(). Quasi-Newton here refers to the popu- 
lar Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, while conjugate gradient uses Fletcher- 
Reeves updates. These are tested on the same design spaces and compared with Algorithms I III. 
To make the optimization problem unconstrained, we use the substitution Wi = zf / J2j=i z ji * = 
1, . . . , n, and operate on z rather than w (see Atkinson et al. 2007). The starting value is = 1. 
We use numerical derivatives for BFGS and conjugate gradient. It should be noted that these 
general purpose algorithms are not guaranteed to find a global maximum. In several cases, 
despite extensive tuning, we have been unable to obtain an output that satisfies our convergence 
criterion (pQ) with e = 10 -6 . Nelder-Mead, for example, often stops at sub-optimal solutions; so 
do BFGS and conjugate gradient for X^{n). In other cases, and with moderate n, we record 
the computing time of BFGS and conjugate gradient in Tables 1, 2 and 4. BFGS seems faster 
than conjugate gradient in these cases and is sometimes competitive with the better of VEM 
and MA (e.g., for ^(20) or ^(50)). However, it definitely takes more time than the cocktail 
algorithm. We note that one must be cautious when making such quantitative comparisons 
between algorithms with very different structures, because details of implementation may af- 
fect the relative performance considerably. Nevertheless, this limited experience makes us more 
confident in recommending the cocktail algorithm, which is simple and fast, and has a global 
convergence guarantee. 
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Table 1: Computing time (in seconds) and number of iterations (in parentheses) for the mul- 
tiplicative algorithm (MA), the vertex exchange method (VEM), and the cocktail algorithm, 
for design space X\(n). Also included is the computing time for conjugate gradient (CG) and 
quasi-Newton (BFGS) methods. 





n = 20 


n = 50 


n = 100 


n = 200 


n = 500 


CG 


14.5 


111.3 


1328.4 






BFGS 


8.82 


39.8 


293.4 






MA 


14.3 (4239) 


63.7 (8015) 


147+ (10000+) 


307+ (10000+) 


762+ (10000+) 


VEM 


0.17 (58) 


1.43 (241) 


23.1 (2113) 


206+ (10000+) 


555+ (10000+) 


cocktail 


0.07 (8) 


0.11 (9) 


0.25 (13) 


0.36 (13) 


0.96 (16) 



Table 2: Computing time (in seconds) and number of iterations (in parentheses) for design space 
X 2 {n). 





n = 20 


n = 50 


n = 100 


n = 200 


CG 


4.61 


164.5 


220.3 




BFGS 


1.83 


14.7 


116.1 




MA 


3.38 (947) 


10.1 (1292) 


76.3 (4105) 


427+ (10000+) 


VEM 


6.32 (1371) 


30.3 (4747) 


4.04 (302) 


252+ (10000+) 


cocktail 


0.31 (24) 


0.65 (25) 


0.21 (10) 


0.63 (21) 



Table 3: Computing time (in seconds) and number of iterations (in parentheses) for design space 
X 3 (n). 





n = 20 


n = 50 


n = 100 


n = 200 


MA 


3.94 (609) 


24.2 (2371) 


44.7 (3016) 


382+ (10000+) 


VEM 


0.80 (182) 


10.7 (1291) 


37.6 (3242) 


127.2 (5324) 


cocktail 


0.72 (22) 


1.56 (32) 


1.34 (42) 


1.21 (29) 
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Table 4: Computing time (in seconds) and number of iterations (in parentheses) for design space 
X A (n). 





n = 20 2 


n = 50 2 


n = 100 2 


n = 200 2 


CG 


1545.9 








BFGS 


657.3 








MA 


25.2 (430) 


993.8 (2302) 






VEM 


8.01 (159) 


195.8 (702) 


94.6 (98) 




cocktail 


0.63 (13) 


3.94 (14) 


17.6 (14) 


74.1 (16) 



5 Discussion 

Although we focus on D-optimal designs for linear models, the basic idea is not limited to either 
D-optimality or linear models. The multiplicative algorithm can be more general and is known 
to be monotonic for a large class of optimality criteria (Silvey et al. 1978, Yu 2010a). For 
vertex exchange strategies with optimality criteria other than D-optimality, we may not have a 
closed form solution for the maximizing step- length similar to ([5]). But such one-dimensional 
maximization problems presumably can be handled by standard tools such as Newton's method. 
The idea of nearest neighbor exchanges is generic. Overall, although the implementation may 
not be as simple, there is no conceptual problem extending the cocktail algorithm to other 
optimality criteria or to nonlinear problems. 

The optimal design problem is closely related to several other statistical problems (Haines 
1998) such as mixture estimation (Lindsay 1983) and nonparametric estimation with censored 
data. There exists a large literature on efficient computation of the nonparametric MLE of the 
distribution function with censored data; see, for example, Wellner and Zhan (1997), Jongbloed 
(1998) and Wang (2008). The cocktail algorithm can be extended to this case and is quite 
competitive; see Yu (2010c). 
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Appendix: Proof of Theorem [j] 

Let be a sequence generated by Algorithm III, and let u>W be a convergent subsequence 
tending to some w * . Monotonicity and w^ ' S show that € f2+ for all t. Hence w* € fJ+. 
Let u>(* +1 / 3 ) be defined as in ([9]). By passing through another subsequence if necessary, we may 
assume u>(*j +1 / 3 ) converges to some w € $7+. Moreover, we may assume that the VDM steps 

w (%+i/3) = VDM(w^) 
are all performed with the same index h — imax 

as in ([I]) , since at least one of the n indices will 

occur infinitely often. 

The mapping w new = VDM(k, w) is continuous on 

{w € fi+ : d(k,w) > d(i,w), 1 < i < n}. 

By letting j — > oo in uife+V 3 ) = VDM(w^), we get w = VDM(k,w*). Because each step of 
VDM, NNE, or MA is monotonic, and t j + 1 < £j+i, we have 

det M(w {t ^) < detM(w^ +1/3) ) < det M(w^ +1) ) < det M(w^ +l) ). 

Letting j — >• oo yields detM(u;*) = det M(w). However, inspection shows that the mapping 
VDM(k,w) strictly increases det M(itf), unless d(k,w) = m, in which case w = VDM(k,w). 
Hence w = w* and d(k,w*) = m. Since k = i m ax, the general equivalence theorem implies that 
w* is a global maximizer of det M(u>) on Q + . That is, all limit points of are D-optimal. 
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